Dimension Reduction: A Powerful Principle for Automatically Finding Concepts in Unstructured Data

Author

  • Holger Bast
Abstract

Dimension reduction techniques have been a successful avenue for automatically extracting the “concepts” underlying unstructured data, a task that naturally arises in fields as diverse as information retrieval, image processing, and social science. It is surprising how much can be achieved for this task using only the raw data itself, without resorting to any additional knowledge or intelligence. We will survey the most important schemes contributed by the various communities to date, commenting on the following aspects: optimization techniques, the role of normalizations, setting the parameters, computing time, quality of results, and the integration of external knowledge.

Similar resources

Learning Relations for Terminological Ontologies from Text

The problem of learning concept hierarchies and terminological ontologies can be decomposed into two sub-tasks: concept extraction and relation learning. We describe a new approach to learning relations automatically from an unstructured text corpus, based on one of the probabilistic topic models, Latent Dirichlet Allocation. We first provide definition (Information Theory Principle for Concept Relat...

Nonparametric Regression Estimation under Kernel Polynomial Model for Unstructured Data

The nonparametric estimation (NE) of the kernel polynomial regression (KPR) model is a powerful tool for visually depicting the effect of covariates on a response variable when the data are unstructured and heterogeneous. In this paper we introduce the KPR model, a mixture of nonparametric regression models with a bootstrap algorithm, which is considered in a heterogeneous and unstructured framewo...

Survey of Text Classification Technique and Compare Classifier

Huge amounts of data on the internet are in unstructured text that cannot simply be used for further processing by computers; therefore, specific processing methods and algorithms are required to extract useful patterns. Text mining is the process of extracting information from unstructured data. Text classification is the task of automatically sorting a set of documents into categories from a predefined set. A major diffic...

Feature Dimension Reduction of Multisensor Data Fusion using Principal Component Fuzzy Analysis

These days, research in many different applications, using different tools, focuses on how to achieve awareness. One serious application is awareness of the behavior and activities of patients. Its importance stems from the need for ubiquitous medical care for individuals. It is sometimes very important that the doctor knows the patient's physical condition. O...

Unstructured Overlapping Mesh Distribution in Parallel

We present a simple mathematical framework and API for parallel mesh and data distribution, load balancing, and overlap generation. It relies on viewing the mesh as a Hasse diagram, abstracting away information such as cell shape, dimension, and coordinates. The high level of abstraction makes our interface both concise and powerful, as the same algorithm applies to any representable mesh, such...

Publication date: 2004